ABSTRACT

The major distinctions between evaluation and research are examined,
the chief differences being the intent and type of criteria against
which judgments are made. Conceptualization of the evaluation process
in higher education is discussed on two levels. A collection of nine
similes for understanding evaluation is examined in terms of major
activities, advantages, and disadvantages: evaluation as:
(1) measurement; (2) expert judgment of worth; (3) assessment between
performance and objectives; (4) the basis for decisions; (5) a
goal-free process; (6) conflict resolution; (7) complacency reduction;
(8) a change agent; and (9) ritual. Consideration is given to three
types of formal evaluation models: the experimental, ecological, and
eclectic approaches. The program evaluation process considered most
restricting, that built into the budget process, is examined in
detail. An investigation is also made of the purpose and practice of
evaluation according to organizational level: state legislative audit,
review by a state coordinating board, multicampus scrutiny, campus
program evaluation, accrediting review, and departmental study.
Factors that affect usefulness of program evaluation reports, such as
timing, comparisons to similar units, and format, are discussed. The
Florida State University System is described. Speculation about the
future of the practice is made by examining present practices in
diverse policy areas: nontraditional delivery systems, government
revenue reduction schemes (taxpayer advocacy), management, regulation
of professions, and consumer protection. (MSE)

Foreword



One aspect of "accountability" is to demonstrate that an activity or
program is fulfilling its stated purposes. One result of decreasing
income is to examine more closely the activities and programs of an
organization to see if they are efficiently performed, effective, and
needed. Thus the increased demand for accountability and the pressures
of limited resources have greatly increased the need for program
evaluation.

When program evaluation is suggested, it is often greeted with
skepticism or apprehension. In its extreme, program evaluation is seen
as a process to either legitimize an activity or to develop a rationale
to cut back or eliminate. Obviously, there is a large middle ground
for the use and results of program evaluation. Its success depends on
how well the process is thought out, how accurately the data is
gathered, and how honestly it is analyzed.

Before program evaluation can begin, there first must be some
basic understanding concerning what is meant by evaluation and
knowledge concerning the various evaluation procedures and techniques
that are available. This research report by Charles E. Feasley,
Coordinator of Operational Services, Extended Learning Institute, at
Northern Virginia Community College, reviews and analyzes the
major literature concerned with program evaluation. After discussing
what is meant by program evaluation and describing nine ways that
program evaluation is used, Dr. Feasley examines the various models
that underlie all program evaluation activity. It is anticipated that
this report will help to establish a more logical foundation for
program evaluation and, when shared with all the parties involved, will
help to develop plans of program evaluation that will encourage
cooperation and minimize anxieties.

Jonathan H. Fife, Director
ERIC Clearinghouse on Higher Education

Overview

experimental and ecological models. Clearly, it is important that an
evaluator and consumer of each evaluation pinpoint all the purposes
for a given study as a prelude to selecting among all the available
evaluation approaches and techniques.

The most potentially program-constricting process for program
evaluation is that which is a part of the regular budgeting process.
Therefore, basic elements, benefits, and deficits of principal budgeting
approaches are examined simultaneously with their ability to
facilitate a quality program evaluation. Partial performance budgets
and zero-base budgets appear to be more supportive of sound program
evaluation practice than are incremental or formula budgets.
However, much more time and line staff involvement is required.

In this report an investigation is also made of the multiple
perspectives on purpose and practice of program evaluation that exist
according to institutional level: state legislative audit, review by state
coordinating board for higher education, multicampus scrutiny,
campus program evaluation, and department study. Evaluations were
initially begun to determine the need for proposed programs and
have spread to encompass most existing programs on a screening
schedule if not an intensive basis. Another trend noted is that
although early state level reviews were limited to quantitative factors,
more recent reviews have included qualitative considerations as well.
The use of both kinds of data produces a more thorough, politically
sound program evaluation.

Considerable debate has taken place on the extent to which goals
and objectives should be the focus of evaluations. Deficiencies in the
usefulness of program activity reviews have been related to a lack of
precision existing in the statement and measurement of program
outcomes. In response to such criticism, many schemata have been
delineated for interrelating and valuing program goals and objectives.
In this report discussion is also focused on the content of
organizational goals and how to avoid the trap of examining only stated
goals. Those actions to be taken because of these concerns include:
(1) looking for unintended outcomes; (2) determining with program
personnel what are realistic goals; and (3) measuring objectives at
several points in the evaluation process.

The report states that the principal purposes of evaluation can
be said to be planning, improvement, and justification. The phases
of an evaluation process are seen to be foundation (establishing the
scope of the evaluation), information gathering, and judgment.
Participants in any evaluation process will be administrators and faculty
of the unit under evaluation, other institutional constituents outside
the unit being evaluated, consultants from outside the institution, or
a combination of any of these groups.

Factors that influence the usefulness of evaluation reports, such as
timing, comparisons to similar units, and format, are discussed at
length. Multiple measures and diverse instruments are seen to
improve the validity and reliability of a program evaluation. Also
considered important are political and ethical aspects of conducting an
evaluation.

Speculation about the future of program evaluations is made by 
examining present practices in diverse policy areas: nontraditional 
delivery systems, government revenue reduction schemes (taxpayer 
advocacy), management (measures of administrative quality), regulation
of professions (quality assurance), and consumer protection
(empowerment of the individual).

What Evaluation Is and Is Not



What We Treasure versus What We Measure

In a comprehensive study of the congruence between ideal goal
preferences and goals actually observed by faculty, trustees, and
administrators of a geographically dispersed set of institutions of six
different kinds (public doctoral-granting institutions, private
doctoral-granting institutions, public comprehensive universities and
colleges, private comprehensive universities and colleges, liberal arts
colleges, and two-year colleges and institutions), Romney and Micek
(1977) noted that most of the measures of progress for goals not given
a high priority by all three respondent groups are the data state
agencies require institutions to collect. Some examples are the number
of students enrolled, the number of full-time equivalent students, and
grade-point averages.

As a result of diminishing state resources and the severe questioning 
of all societal institutions that has taken place in the past decades, 
there is a compelling need to measure what is transpiring within and 
as a result of society's social and educational programs. The two prin- 
cipal approaches to such measurement, evaluation and research, are 
often confused with one another. A discussion of the similarities and 
differences of these two fundamental concepts follows. 

Evaluation versus Research

It is commonly stated that evaluation produces "worth labels" for
some set of options within a decision situation. In contrast, it is said
that research produces "truth claims" that are generalizable and serve
as a basis for theory building (Popham 1975). Woodrow Clark (1977)
suggests that it is useful to compare research and evaluation by
looking at one's intention and one or more barriers to accomplishing
that intention. Within research we want to know something in a
generalizable way. The barriers to knowing in that way are twofold.
First, the idea we want to understand has not been systematically
examined before. Second, prior investigation of the ideas has been
inconclusive. On the other hand, within evaluation we have the
intention of making a choice between options. Barriers stem from the
facts that the worth of the options is unclear and the information
needs of the decision-maker are also unclear (Clark 1977, p. 9).

The most thorough discussion of similarities and differences between
research and evaluation can be found in Worthen and Sanders (1973).
They focus their discussion on certain analytical elements.

Motivation of the Investigator: Research and evaluation are
undertaken for different reasons. Research is undertaken in response to
curiosity about an idea. Use of collected information is left to the
natural processes of dissemination and application (Weiss 1972, p. 6),
which is generally the publication of results. In contrast, because
evaluation is intended to aid in the resolution of a particular kind of
pragmatic problem, there is an intentional scheme for distributing
the results to decision-makers.

Objective of the Search: Different outcomes are sought by research
and evaluation. An evaluation study will collect specific data wanted
by a given decision-maker. Thus the study is directly requested and
supported by that policy-maker to produce decisions. In contrast, only
the investigators of research will select hypotheses and contexts in
which to test hypotheses. The final outcome will be conclusions about
phenomena.

Role of Theory: Research is the search for laws about the
relationships among two or more classes of objects or activities, while
evaluation describes the value characteristics of a specific thing.

Role of Explanation: A fully adequate evaluation can describe the
value(s) of the subject under scrutiny without providing an
explanation of how the effects are generated. On the other hand, research
is conducted to pinpoint cause-and-effect relationships and trends.

Properties of the Phenomena: Evaluation is an activity designed to
produce assessment of worth (social utility of a thing). Research is an
activity intended to assess scientific truth, which is characterized
by two principal properties: empirical verifiability and logical
consistency.

Generalizability of the Phenomena: Evaluation can also be
distinguished from research by the extent to which generalizations can be
made about phenomena across time, across geography, and across
types of educational activity. Research is said to produce results that
are highly generalizable in all three ways. While product evaluation
is usually generalizable with respect to geography, program
evaluation has limited generalization with regard to all three elements
(Worthen and Sanders 1973, p. 27).

Investigative Techniques: There has not been widespread agreement
on the extent to which research and evaluation share investigative
techniques. Several writers have stated that comparative experimental
design, which is a frequent method of investigation within a research
study, is not appropriate for an evaluation study (Carroll 1965;
Cronbach 1963; Guba and Stufflebeam 1968). On the opposite side of this
issue, in addition to Worthen and Sanders (1973), those writers that
have concluded that there is very little difference in investigative
approaches are Stake and Denny (1969, p. 374), Weiss (1972, p. 6),
and Hemphill (1969, p. 220).

Worthen and Sanders (1973) speculated that a researcher can "get 
away with" using the tools of only one discipline, since he chooses 
the question that is under investigation and would rarely ask a ques- 
tion that forces him outside the discipline in which he was trained 
(p. 135). Since the evaluator has to answer questions selected and de- 
fined by someone else, the answers are not as likely to be found by 
using only one standard methodology. 

Criteria for Judging: Judging the adequacy of research and the ade- 
quacy of evaluation requires different criteria. Good research needs 
to have internal validity; that is, it should measure the effect of the 
variables under investigation rather than any extraneous influences. 
Good research should also possess external validity; its results should 
be applicable to other settings.

Primary evaluation criteria are isomorphism and credibility. The 
first criterion refers to the information that is gathered being in the
same patterns as the information desired. The second criterion indi- 
cates that the information collected is believable to its users. 

Both Weiss (1972) and Suchman (1967) have provided muted
comparisons of evaluation and research by referring to "evaluation
research." Suchman compares evaluation and evaluation research by
saying that the latter comes closer to proving worth rather than
asserting it (p. 8).

In summary, the major distinctions between evaluation and other 
types of research focus on intent and the existence of criteria against 
which one can make judgments. 



Similes and Models 



Within the literature of the emerging field of evaluation there have
been two levels of concepts put forth to facilitate understanding. On
one level are similes that explain evaluation by comparing it to other
more widely understood activities.

Evaluation as Measurement 

The focus of this approach is on data and the formalized
instruments used to collect them; frequent reference is made to standardized
scales. Gardner sees the general process consisting of four steps: (1)
identify attributes to be measured; (2) design and test an instrument;
(3) use instrument under standard conditions; and (4) compare
results to a standard (1975, p. 576).
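
The four steps Gardner lists can be pictured with a small sketch. The
Python below is only an illustration under stated assumptions: the
Instrument class, the 1-to-5 satisfaction scale, the sample scores, and
the 3.5 standard are all hypothetical and are not taken from Gardner or
from any study cited in this report.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Instrument:
    """Hypothetical instrument: one attribute measured on a fixed scale."""
    attribute: str      # step 1: the attribute to be measured
    min_score: float    # step 2: the designed scale, lower bound
    max_score: float    # step 2: the designed scale, upper bound

    def administer(self, raw_scores):
        # Step 3: use the instrument under standard conditions. In this
        # sketch that only means rejecting scores outside the scale.
        for score in raw_scores:
            if not (self.min_score <= score <= self.max_score):
                raise ValueError(f"score {score} is outside the scale")
        return list(raw_scores)


def compare_to_standard(scores, standard):
    # Step 4: compare the results to a standard (assumed here to be a
    # minimum acceptable mean score).
    observed = mean(scores)
    return {
        "observed_mean": observed,
        "standard": standard,
        "meets_standard": observed >= standard,
    }


instrument = Instrument("student satisfaction", 1, 5)
scores = instrument.administer([4, 5, 3, 4])
result = compare_to_standard(scores, standard=3.5)
print(result)
```

Note that the sketch yields a comparable number but says nothing about
whether the attribute chosen in step 1 matters to the program, which is
the limitation of this simile discussed in the paragraphs that follow.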

Equating evaluation with measurement permits an evaluator to
capitalize on the primary images of scientific measurement, those of
reliability and objectivity. While measurement instruments produce
data that are easily manipulatable into norms and standards, concern
has been expressed about scores becoming ends in themselves,
obscuring judgments and judgment criteria (Jemelka and Borich 1979,
p. 264).

Failure of this model is likely if the entity to be evaluated does not
possess significant measurable characteristics or the instrument used
does not adequately measure the characteristic sought.

Measurement specialists such as Thorndike and Hagen (1961) con- 
cede that true evaluation involves the judgment of worth that exceeds 
the collection of measurement data: 

The term evaluation as we use it is closely related to measurement. It
is in some respects more inclusive, including informal and intuitive
judgments . . . saying what is desirable and good (p. 27).

The chief advantage of evaluation as measurement is the production
of results that are comparable and replicable. On the negative
side, the aspect of an entity that can be measured may be peripheral
to the objective sought.

Evaluation as Professional Judgment

Common examples of the use of this evaluation model would be
accrediting visitation teams, referees for journals, and peer review
for awarding tenure or grants. Criteria may be public. Methods include
personal observation, interviews, tests, and review of documents.

The advantages of this approach are: ease of implementation; the
inclusion of many qualitative and quantitative variables; and the
quickness of the results and conclusions. The major difficulties have
to do with undesirable subjectivity, potential unreliability, and
difficulty of generalizing to other programs.

Popham suggests that there are two types of professional judgment
approaches: those based on intrinsic criteria and those based on
external criteria. To illustrate the difference between the two in his own
way, Popham discusses the purchase of an electric drill. You could
judge the drills on the basis of intrinsic characteristics: design, style,
weight, and color (who wants an ugly electric drill?). You could also
judge them on extrinsic factors such as how fast or how neatly they
drill holes (who wants a glamorous drill that won't dent butter?)
(1975, p. 11). Accreditation is a major example of professional
judgment using intrinsic criteria.

Evaluation as Assessment Between Performance and Objectives

Popham (1975) labels this approach a goal-attainment model.
Gardner suggests that no other type of evaluation has received more
attention in recent higher education literature, encompassing as it does
competence-based education and efforts to measure the program goal
of equal opportunity for various subpopulations and education for
coherent careers (1975, pp. 577-8).

Ralph Tyler (1950) is generally thought of as the father of the
behavioral objectives movement. He advocated that the objectives of
a program be spelled out in terms of specific student (client) behaviors.
These behaviors are measured with either norm-referenced or
criterion-referenced tests. The formulation of goals emerges from an
analysis of three goal sources: the student, the society, and the
subject matter.

A more recent goal attainment model has been proposed by
Hammond, who discusses the nature of the institutional and instructional
factors that might be relevant in the degree to which stated objectives
are attained (1973). The steps in Hammond's model include: isolating
the portion of the current program to be evaluated; defining the
pertinent institutional and instructional variables; stating objectives
in behavioral terms; assessing the behavior described in the objectives;
and analyzing goal attainment results.

Personal interaction with program staff is a common research
technique of this approach. So is the examination of documents [...].
Both Scriven (1967) and Stake (1967) emphasize the early involvement
of evaluators to assist in defining objectives [...].

This approach has been criticized for focusing on products rather than
processes, which may result in the overlooking of important side
effects. Heavy emphasis is given to products that are student behaviors.

Evaluation as the Basis for Decisions

Although different language has been used by proponents of
decision-oriented evaluation (Stufflebeam et al. 1971; Provus 1971; Alkin
1969), three basic assumptions were observed by Floden and Weiner
(1976) as common elements to the individual works. First, publicly
proclaimed goals are the focal point for enacting programs.
Second, evaluations collect information on the way in which [...]
the effectiveness of programs [...]. Third, it is stated that
decision-makers [...] information [...].

A decision-oriented evaluation model has been developed by
the Phi Delta Kappa National Study Committee on Evaluation:
the CIPP (Context, Input, Process, Product) model. The CIPP model
[...] four evaluation activities: (1) context evaluation to help
determine objectives; (2) input evaluation to clarify
ways that resources can be allocated to achieve project goals; (3)
process evaluation to guide implementation; and (4) product
evaluation to determine if an activity should be continued, revised, or
ended (Stufflebeam et al. 1971, p. 129).

Evaluation of this type is [...] a continual exchange between [...].
The methodology of evaluation is [...] to provide information [...]
program [...] decisions (Stufflebeam et al. 1971, p. [...]).

A similar decision-oriented model has been advocated by the staff
of the Center for the Study of Evaluation (CSE) at the University of
California, Los Angeles (Alkin 1969). There are five stages of the CSE
model. The initial stage is needs assessment, which consists of noting
the difference between what is existing and what is desired. The
principal focus of this stage is problem selection.



The second stage of the CSE model is program planning, with its
major focus on program selection. The third stage focuses on program
modification. It is known as implementation evaluation and provides
information on the extent to which the program follows its own plan.
The fourth stage, process evaluation, also has a focus on program
modification. This stage gathers information about how well the
objectives of the program are being met. In this way products are
examined en route. The final stage is outcome evaluation. This refers
to the collection of information about general worth of the program
as reflected by the outcomes it produces. This final stage focuses on
program certification, modification, elimination, or dissemination.
The CSE model has been made more usable than several other
approaches by extensive inservice training modules and workshops.

The principal criticism of the decision-oriented approach is that
the evaluator accepts a decision context and values/criteria that have
been defined by the decision-makers. Apple (1974) and Cooley and
Lohnes (1976) observe that no evidence has been shown that the
decision-maker has any more proficiency than the evaluator in these tasks
of determining setting, program options, and priorities of worth. While
Stufflebeam et al. (1971) argue that an evaluator loses his objectivity
(and usefulness) by participating in decision-making, Scriven sees an
individual as abrogating his evaluator role when he fails to participate
(1967).

Evaluation as a Goal-Free Process

In an attempt to avoid bias that may exist because of the narrow
range of program developers' prespecified goals, an evaluator looks
for all outcomes of a program including unexpected effects. These
outcomes are examined and summarized in a single ranking of social
utility.

Scriven brings to our attention methodological analogies of goal-free
evaluation (1972, p. 1). In the field of aesthetics it is a common
operating principle not to consider an artist's intentions in assessing 
a particular work of art. In philosophical ethics there has always been
an argument between those who believe that the morality of an act
is primarily determined by the motivation of the actor (he meant
well) and those who would evaluate acts in terms of the consequences
only. The double-blind design employed in much scientific research
is given as one further example. 

The consumer of the program's services is seen as the major
audience for goal-free evaluation (House 1978). This approach is said to
be relentlessly comparative in nature (Scriven 1976). An evaluator may
collect information relevant to program effects as they relate to
accepted societal norms or some other type of generally recognized
standard (Gardner 1975, p. 584).

Stake delineates a detailed process for conducting his form of
goal-free evaluation: responsive evaluation. First, conferences are held
with clients, staff, and audiences to identify program scope, discover
purposes and concerns, conceptualize problems, and identify data needs.
Next, observers and judges examine selected antecedents, transactions,
and outcomes. The third step is to prepare portrayals and case studies.
Finally, reports are written and presentations are made (1975).

The goal-free evaluator must be skilled at interpersonal relations
because of continued communication with program personnel. The 
lack of emphasis on formal measurement methods is said by critics 
to make goal-free evaluation too subjective. 

Evaluation as Conflict Resolution 

The commissioning of an evaluation study can be a signal that the 
program under scrutiny is subject to negotiation and modification. 
Individuals with opposing views will see the evaluations as a battle 
of worth, the outcome of which could determine a shift of program
activities closer to their preferences (Floden and Weiner 1976, p. 8). 

The major approach identified by the author as being consonant 
with this simile of "evaluation-as-conflict-resolution" is the judicial or
adversary model of evaluation. 

During the past decade considerable interest has arisen in
adversary evaluations (Kourilsky 1973; Levine 1974; Owens 1973). The
chief format used is that of the jury trial. The stages of the judicial
evaluation model described by Wolf are typical. First is the
issue-generation stage, in which a variety of persons involved in or affected
by the program identify a broad range of concerns. The second stage,
issue selection, centers on reducing the range of issues to a
manageable number for the hearing. Next is the stage for the preparation of
arguments, which consists of collecting testimony and abstracting
relevant documents. The final stage is the hearing, with its
presentation of arguments and panel deliberation (Wolf 1975a; 1975b).

Owens and Hiscox (1977) carried out six case studies of different
uses of adversary evaluation and then compared them on the basis
of purpose, format used, issue identification and selection, data
collection for argument preparation, presentation, and decision-making.
They noted three important spinoff effects of adversary evaluation:
better communication between evaluators and decision-makers; greater
attention to formulating key evaluation issues; and increased concern
for metaevaluation (the evaluation of the evaluation that has been
conducted) (p. 20).

After participating in an adversary evaluation conducted by the
Northwest Regional Educational Laboratory on Hawaii's 3-on-2
program, Popham and Carlson had six criticisms of the general adversary
model: fallible judges; excessive confidence in the model's usefulness;
disparity of adversary ability; potentiality for manipulating a
particular result; excessive expense; and difficulty in framing the issue(s)
(1977). Both Jackson (1977) and Thurston (1978) have observed that
most of these criticisms apply to almost all other evaluation
approaches. Both writers place great faith in the effect of the openness
of the entire process. In at least one instance an outside anthropologist
was employed to observe and record notes on proceedings; a videotape
of hearings has been produced in many instances to secure reactions
to the process itself (Owens and Hiscox 1977, p. 22).

In addition to the highly valued characteristic of openness, a real
plus is also found in the more active role that educational juries are
encouraged to take. This may include questioning a witness and
taking written notes during the trial. The process speeds discussion of
pressing issues that might be debated in professional journals over a
period of years (Jackson 1977).

On the other hand, among other unresolved problems of the
adversary approach, the following have been listed: how confidential
jury deliberations should be; the best working relationship for a jury
composed of expert and nonexpert jurors; the size of a jury that is
most efficient; the need for preexisting law (Denny 1970); when
multiple hearings are justified; how "hard" data can be more
effectively integrated with human testimony; and whether decisions must
be made solely on the basis of evidence presented at the hearing
(Owens and Hiscox 1977). Adversary evaluation is synthetic in
allowing for the presentation and scrutiny of many different evaluation
methodologies.

Beyond the frequent use of a jury, other adversary evaluations have
employed debates and contrasting position papers. Thurston (1978)
has explained the potential use of an appellate court model and an
administrative hearing officer. The administrative hearing provides
more public discussion and display than the appellate court, while
the latter produces a written record, which is more useful in guiding
future decision-making.

Evaluation as Complacency Reduction

The very act of participating in an evaluation may spur the
consideration of ideas by practitioners. Such participation produces
a clarifying of program goals and available alternatives; furthermore,
there may be increased satisfaction with the administration and
evaluation of the program.

Despite the diverse purposes to be fulfilled by evaluation, a narrow
range of evaluator skills (usually testing, survey analysis, proposal
writing, and report production) has been favored thus far. To serve
the role of evaluation as conflict resolution or complacency reduction
well, evaluators must have additional skills such as teasing out hidden
goals and assumptions, training in mediation, and adeptness in
interpersonal communication (Floden and Weiner 1974, p. 11).

Evaluation as Change Agent

"Formally documenting and describing what is already part of the
informal communication network can have a powerful impetus for
change" (Smock 1975). If the evaluation is done by an "outsider"
rather than an "insider," this documentation can have an even more
profound effect.

Evaluation as Ritual 

The conduct of an evaluation of a program produces a picture of
government accountability and rationality that in turn promotes a
feeling of security in taxpayers. The commencing of an evaluation
suggests that the government is searching for improvements to its
practices and solutions to existing problems. Also, evaluations may
simplify complex social problems into a choice between clear
alternatives. Smock (1975) refers to evaluation as convention, standard
approach, and liturgy. The ritual functions of evaluation are most
strongly engendered when no recognition is given to that particular
role of evaluation.

Formal Models for Allegiance

On a more formal level are detailed, multi-step models which
describe the purpose and procedures for conducting a program
evaluation. Classification of formal models of evaluation into experimental,
ecological, and eclectic approaches has been made by Mims (1978).

Experimental approaches emerged from the natural sciences and
psychology. They are used for accountability and specific
decision-making. The chief models of this kind are the goal-based experimental
and quasi-experimental models, which have been described by
Campbell and Stanley (1966) and Patton (1978).

Practical considerations that diminish the application of experi-
mental approaches to evaluation are discussed by Stufflebeam et al.
(1971): laboratory research designs require conditions that are difficult
to achieve in evaluation settings; an evaluator must remain as un-
obtrusive as possible to program activities, rather than manipulating
the environment as the experimenter does; experimental data col-
lections occur at the conclusion of treatment and are of little value
when correcting processes; experimental control requires the use of
only one treatment at a time, a practice which is not possible with
clients who may benefit from a treatment; and statistical techniques
used with experimental procedures frequently offer restricted de-
cision rules (a null hypothesis may be rejected or accepted, one
treatment may be judged better than another), which may be
inadequate for complex evaluation-based decisions.

While experimental approaches are quantitative, deductive, and
uniform in nature, ecological methods (emanating from the disciplines
of sociology and anthropology) are qualitative, inductive, interpretive,
and diverse in nature.

The principal use of ecological approaches is to increase under-
standing and improve programs. Major models of this type include
illuminative (Parlett and Hamilton 1976), transactional (Rippey
1973), and responsive evaluation (Stake 1975).

The major purposes of the illuminative model are description and
interpretation. The process has been described by Parlett and Hamil-
ton (1976) as: investigators observe, investigators inquire further, and
investigators seek to explain. The techniques most commonly used
are observation, interviews, questionnaires, and document analysis. The
problem under investigation is said to define the methods, not vice
versa. The audiences for these reports are program participants, pro-
gram sponsors, and interested outsiders.

There are some problems with this approach. Its techniques are
likely to be viewed as generally subjective, which is related to their
difficulty of replication and transfer to other settings; and there is a
considerable need for evaluators to have strong interpersonal skills.

Responsive evaluation is said to come closest to democratic plural-
ism (House 1978, p. 11). Any group that becomes actively involved
in the evaluation process will have its concerns represented in that
evaluation. Procedures are selected to fit the issues of interest.

Stake advocates the collection of descriptive and judgmental data

Contextual Coaxing and Control



Evocation of Evaluation by Society 

Those conditions identified by Peterson as predisposing factors to
the implementation of state program review activities include: di-
minishing financial resources; declining public confidence in state
government; state officials' stressing of accountability; and budget in-
novations by higher education agencies and institutions (1977, p. 10).
Hill, Lutterbie, and Stafford identified additional factors: enroll-
ment declines; the push for expansion in high-cost programs such as
health-related professions; the slowdown in federal support; and the
realignment of programs to meet standards set by the U.S. Depart-
ment of Health, Education, and Welfare for an integrated system of
higher education (1979, p. 1).

The turning point for plans to become programs is the authorizing
fiscal budget. Any decisions to begin programs, modify their scope,
or discontinue them (the major purposes of program evaluation) are
most directly declared in the budgetary process and content.

Evaluation by Budgeting Approaches

After noting that a government budget serves three functions (con-
trols spending, enables management of activities, and determines ob-
jectives), Schick suggested that the budgeting approach favored at a
given time parallels the belief that one of these three purposes is
being emphasized at the expense of the others (1971, p. 3). The first
period of budget innovation occurred from about 1910 to 1935. This
executive budget campaign was intended to emphasize control so as
to prevent waste and corruption. Incremental budgeting is a common
reminder today of that period.

The next era was that of performance budgeting, lasting until the
late 19[?]0s. The focus was on good management to achieve stated
goals. Formula budgeting is one widespread artifact of this era. Re-
cent interest has been shown in the use of performance budgeting for
a portion of an institution's total budget.

The third budgeting movement focuses on planning. The develop-
ment of planning, programming, and budgeting systems (PPBS) at
the federal level represents the first event in the shift to planning
budgeting. After use for several years, a number of states joined the
federal government in concluding that the time had not come for
PPBS. One difficulty was that PPBS became a parallel and com-
petitive process to the traditional budgeting approach, rather than
supplanting it (Caruthers and Orwig 1979, p. 50). Zero-base budget-
ing is the latest approach to have an effect on the use of program
evaluations.

Incremental Budgeting

Incremental budgeting is thought to be the most commonly used
budgeting approach in higher education today (Adams, Hawkins, and
Schroeder 1978, p. 51). Caruthers and Orwig see the incremental ap-
proach to budgeting as requiring the least work and analysis, while
causing the least political conflict. It provides the least information
concerning whether the budgetary decisions support institutional goals
(p. 38). Hence, it has minimal usefulness in program evaluation.

Lingenfelter (1974) examined the operating budget requests, gov-
ernors' recommendations, and final appropriations for higher educa-
tion in Illinois, Michigan, and Wisconsin for the period of 196[?]-
1974. He also interviewed more than 80 people in diverse decision-
making roles before concluding that incremental budgeting models
worked extremely well to explain appropriations outcomes.

In contrast, Bailey and O'Connor reanalyzed case studies collected
by Wildavsky (1964), Fenno (1966), and Sharkansky (1968) to
demonstrate the prevalence of incrementalism (1975). The surprising
finding of the reanalysis was the significant number of instances of
nonincremental changes in annual output at the federal and state
level.

Formula Budgeting 

In a more concise characterization, Meisinger says that a formula
budget is a combination of technical judgments and political agree-
ments (1976, p. 2). This pair of elements is also found in program
evaluations. Advantages and disadvantages of formula budgets can
also apply to program evaluations.

These advantages have been attributed to formula budgeting: ease
of preparation and understanding; clear use of financial incentives to
support statewide priorities; equitable treatment of institutions (Ca-
ruthers and Orwig 1979, p. 1[?]); preventing rich institutions from get-
ting disproportionately richer (Moss and Gaither 1976, p. 553); pre-
diction of future resource allocations; and making sure that higher
education receives its share of total state resources, based on need
and objective requests (Moss and Gaither 1976, p. 553).

Disadvantages noted for the use of formulas in budgeting include:
failure to provide start-up costs for new or innovative programs
(Meeth 1975); failure to react quickly to rapidly shifting price
changes (Moss and Gaither 1976, p. 558); failure to measure more
than one level of quality (leveling); and lack of flexibility to handle
the complexities of enrollment decline (no consideration for econo-
mies of scale).

After concluding that 25 of the 50 states used budget formulas in
allocating funds to higher education, a study by the Michigan Depart-
ment of Education noted that if one considers quantitative guidelines
as well as formulas in use by state governments, almost all state
budgeting processes employ quantitative measures (1976, p. 15).
Glenny et al. (1975, p. 46) refer to nonformula states as indicator
states, where indicators are used to analyze but not generate budget
requests.

Formula budgeting is not expected to be replaced in the future, only
slightly altered in content to account for fixed costs not reduced by
declining enrollments (Moss and Gaither 1976, p. 560; Caruthers and
Orwig 1979, p. 45). The inclusion of qualitative elements is also
foreseen for formula budgeting.

Partial Performance Funding 

The Tennessee Performance Funding Project (TPFP) is an ef-
fort to improve the state's formula budgeting and appropriations pro-
cess. It explores the question of whether it might be desirable and
possible to allocate some portion of state funds to colleges and uni-
versities on a performance criterion (How good?) as compared to the
current credit-hour and enrollment criteria (How much?) (Bogue
1976, p. 3). The underlying assumption has been that even imperfect
measures, wisely chosen, may operationally improve the allocation
process (p. 11).
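
The contrast between "How much?" and "How good?" criteria can be illustrated with a minimal sketch. It is not the TPFP's actual method: the function name, the idea of a fixed performance share, the enrollment-weighted performance pool, and all figures are assumptions made only for illustration.

```python
def allocate(total_funds, performance_share, enrollments, performance_scores):
    """Split state funds between an enrollment pool and a performance pool.

    Hypothetical illustration only: the source describes the idea of
    allocating "some portion" of funds on performance, not this formula.
    """
    performance_pool = total_funds * performance_share   # "How good?"
    enrollment_pool = total_funds - performance_pool     # "How much?"

    total_enrollment = sum(enrollments.values())
    # Assumption: performance money is shared in proportion to
    # enrollment weighted by a performance score between 0 and 1.
    weighted = {k: enrollments[k] * performance_scores[k] for k in enrollments}
    total_weighted = sum(weighted.values())

    return {
        k: enrollment_pool * enrollments[k] / total_enrollment
        + performance_pool * weighted[k] / total_weighted
        for k in enrollments
    }
```

With a performance share of zero the sketch reduces to a pure enrollment formula; raising the share shifts money toward institutions with higher scores while the total stays fixed.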

The first phase of the TPFP consisted of involving national and 
state authorities in clarifying the conceptual base of the project, 
identifying related efforts underway around the country, outlining 
procedures for executing the project, and obtaining the necessary 
support for pilot projects (Bogue and Troutt 1977a, p. 4). The more
widely used outcomes of this phase have been a set of hypothetical 
examples illustrating various approaches to performance funding 
(Bogue and Troutt 1977b) and a delineation of graduate competencies 
(Tennessee Higher Education Commission 1977). 

Any modification in the funding policy was viewed as needing to

Several concerns remain about performance funding: the focusing of
attention on indicators (the means) rather than goals (the ends);
the uniform interpretation of performance data without regard to con-
cerns such as reliability and validity; and the misleading comparisons
of performance on similar or common indicators for institutions with
differing missions, resources, and clients (Dumont 1978b, pp. 9-10).

Zero-Base Budgeting 

In advocating the use of the current form of planning budgets,
zero-base budgeting (ZBB), Pyhrr claims that PPB focuses on what
will be done, not on how to do it, and that PPB does not provide an
operating tool for line managers who implement the policy and pro-
gram decision (1973, p. 149).

Schick proposes that zero-base budgeting consists of three ele-
ments. First are the decision units of an organization, which have
much to do with defining objectives and instituting sets of activities
to accomplish stated objectives. Second, decision packages represent
those sets of activities that are combined for the attainment of one or
more objectives. Third, the managers of decision units and other
higher-level administrators rank the importance of the various decision
packages (1977). Thus, ZBB can induce integrated program evalua-
tions at several levels.
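
The three elements can be sketched in a few lines. This is an illustration under stated assumptions, not Schick's formulation: the class and function names are hypothetical, and the funding cutoff (funding packages in rank order until the next one no longer fits) is a common reading of ZBB practice that the text above does not itself state.

```python
from dataclasses import dataclass


@dataclass
class DecisionPackage:
    unit: str     # the decision unit that prepared the package
    name: str     # the set of activities being proposed
    cost: float   # requested funding
    rank: int     # 1 = highest priority, as ranked by managers


def fund_packages(packages, budget):
    """Fund packages in rank order until the budget cannot cover the next one.

    Assumption for illustration: everything below the first package that
    does not fit is left unfunded (a simple cutoff line).
    """
    funded = []
    remaining = budget
    for package in sorted(packages, key=lambda p: p.rank):
        if package.cost > remaining:
            break
        funded.append(package)
        remaining -= package.cost
    return funded, remaining
```

Because every package carries its own objectives, cost, and rank, the ranking step forces an explicit comparison of programs, which is the sense in which ZBB can prompt program evaluation at each level that ranks.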

Since zero-base budgeting was first used on the state-government
level in Georgia, the observations of objective, yet close-at-hand re-
searchers are worth consideration. Minmier and Hermanson (1976)
surveyed budget analysts, then conducted follow-up interviews with se-
lected analysts and department heads. While the majority of respon-
dents felt that the quality of management information gathered under
ZBB had improved, they did not believe that there was a significant
reallocation of the state's financial expenditures. However, it did in-
volve more line administrators in the budgeting process than earlier
budgeting methods. Nevertheless, the first year of ZBB took sub-
stantially more time and effort than previous budget preparation.

Fincher noted that there appears to be no evidence that ZBB re-
sults in more clearly established goals or better measures of the
progress toward stated goals (1978).

Other researchers have examined the implementation of ZBB in
various locations to find problems and solutions that will generalize to
subsequent adoptions and adaptations. Scheiring looked at the first-
year use of ZBB in New Jersey and concluded that ZBB cannot be
implemented overnight (1976). Not only do the proper forms have

Program Evaluation at Different Organizational Levels



Patton (1978) observed that there is no one effective strategy of
evaluation in the abstract, separate from the organizational context
in which it is introduced or the information-using capabilities of peo-
ple employing it (p. [illegible]). The key questions to be answered are Why?
Who? and How? Mims focused on four broad purposes of program re-
view: context, input, implementation, and outcome. Determining the
need for a new program is a primary example of a context review. An
input review collects suggestions on how a program should operate.
An implementation (or management) review examines whether the
program is being conducted as planned and with what degree of ef-
fectiveness and efficiency. An outcome (or impact) review not only
asks Did it work? but probes any unanticipated outcomes and why they
occurred (1978).

The possibilities for who should conduct a program evaluation
include: program personnel (a self-review); external reviewers
within the institution (other faculty and other administrators); exter-
nal reviewers outside the institution (accreditation teams, disciplinary
specialists, professional evaluators); and multiple or mixed-group re-
viewers.

Because there are different participants and procedures used at the
various organizational levels at which a program review may be com-
missioned (state legislature or governor, state coordinating board,
multicampus system office, campus or department), it is vital that each
of these perspectives of the program evaluation process be examined.

Legislative Reviews

Berdahl (1977) related details about the shift of the post-audit
function for state programs from its early executive branch or inde-
pendent status to a legislatively affiliated audit in over half the states
([?]6 in 1975). Several states (California, Pennsylvania, and Wis-
consin) have both executive and legislative auditors.

Pethel and Brown (1971) drew a distinction among three types of
legislative audits or reviews. A financial audit examines whether money
was spent according to legal procedures and for its allocated purpose.
A management audit examines efficiency, the amount of resources
needed to attain a particular program objective. Finally, a performance
audit looks at the extent to which a program objective is met
(effectiveness). For example, if the objective is to provide training for a
particular population, its efficiency is that proportion of the popula-
tion that received such training or that amount of the training
activities individuals had an opportunity to complete. Performance
audits are now mandated by sunset laws in over half of the states.

Sunset Laws 

Sunset legislation was first conceived in 1975 and is now law in
more than 29 states (Sherman 1978, p. 1). At least twelve states
focus on regulatory activities. Nine state laws add advisory bodies
and departments to the regulatory scope of sunset coverage. Seven
states encompass all government agencies in their sunset laws.

Nineteen of the state laws require that preliminary reports
be written by existing government units. These reports will vary in
quality according to the staff doing the reports and their objectivity.
North Carolina established an independently staffed commission to
do the preliminary evaluations. However, most states use legislative
audit, fiscal research, or substantive committee staff.

Although there has been considerable difficulty in collecting rele-
vant data, sunset review has prompted agencies to organize informa-
tion better and establish new data collection systems for future re-
views (p. 18). The evaluation reports vary considerably in depth,
ranging from New Mexico's 50-page report on 19 boards to a
110-page report in Tennessee on the Board of Accountancy.

Efforts to determine the costs of sunset evaluation studies have
been unproductive because a considerable proportion of the expenses
reflected start-up costs, and the early units that were reviewed had not
been subject to much legislative scrutiny previously.

At least two important lessons have been learned by evaluation
staff about how to improve the sunset review process. First, all inter-
ested parties can be kept informed from the beginning of the review
so that the adversarial nature of the process is minimized. Second,
recommendations can be carefully offered so that they will apply to
other agencies in addition to the one unit under scrutiny. Generally,
there has been a good record of the acceptance of recommendations
(p. 25).

Sunset reviews offer great potential for facilitating a dialog among
legislators, administrators, and citizens. The process can realign the
power position of the executive and legislative branches of govern-
ment and can certainly increase the information available to decision-
makers and the public.

Case Studies of Legislative Audits

Berdahl (1977) presents a study of the Wisconsin Legislative Audit
Bureau interacting with the central administration of the University
of Wisconsin over the request of the governor to prepare procedures
to operate with reduced resources. Also examined by Berdahl is the
1973 evaluation of the Virginia College System by the Joint Legisla-
tive Audit and Review Commission.

Legislative Assistance 

Since the objectives in higher education are not always known, are 
often in conflict with one another, and sometimes cannot be agreed 
on, the state level review must be concerned with what ought to be 
done as much as with how to do it (Halstead 1974, p. 654). 

A series of grants have enabled the Eagleton Institute of Politics 
to provide training and technical assistance to legislative staffs so 
that they can establish and/or improve program review processes.
Such funding began in 1971 with a focus in six states on the legisla- 
tive oversight of education. The most current grant supports the 
efforts of eight states in education and social sciences. One of the 
more important outcomes of these grants was a planning and imple- 
menting guidebook prepared by Murphy (1976). 

State-Level Higher Education Agencies

Dougherty (1979a) described statewide reviews as a mechanism for
providing local institutions with state, regional, or national perspec-
tives (p. 11).

In his first comprehensive survey of state-level academic program
reviews by higher education agencies, Barak (1975) looked at policies
and procedures for examining new/expanded programs and existing
programs. Seven major criteria were included in the state coordinating
agency policies for approving new programs: program description,
purposes and objectives, needs analysis, cost analysis, resource analysis,
program accreditation, and availability of adequate student financial
aid (p. 5). Barak also determined the extent to which states actually
used quantitative criteria recommended by the Task Force on Coordi-
nation, Governance, and Structure of Postsecondary Education of the
Education Commission of the States (1973) to guide program dis-
continuance. These criteria included: number of graduates in each
of the past five years; number of students enrolled; size of classes; cost
per program graduate; faculty workload; program quality as reflected
by regional or national reputation; production of graduates from
similar programs in the state, region, or nation; economies and im-
provements in quality to be achieved by consolidation or elimina-
tion; student interest and demand; and appropriateness to institu-
tional mission (p. 1[?]).

Barak said that it is common practice for institutional reviews to
focus on qualitative criteria, while state-level analysis gives almost ex-
clusive attention to quantitative data. There is generally a two-phase
process that occurs. In the first phase a screening process is used to
identify programs that are of questionable need, productivity, quality,
or other criteria (Barak and Berdahl 197[?]). A more intensive review
is then conducted in phase two on the programs identified in phase
one. Peterson (1977) noted that most states are now talking about
qualitative as well as quantitative outcome measures (p. 3).
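
The first, screening phase lends itself to a small sketch. The version below uses two of the quantitative criteria listed above (graduates in each of the past five years and number of students enrolled); the function name, the record layout, and the threshold values are hypothetical and are not drawn from any state's actual procedure.

```python
def screen_programs(programs, min_avg_graduates, min_enrollment):
    """Phase one: flag programs that fall below quantitative thresholds.

    Flagged programs are candidates for the more intensive phase-two
    review; being flagged is not itself a judgment of quality.
    The thresholds are illustrative assumptions.
    """
    flagged = []
    for program in programs:
        graduates = program["graduates_last_5_years"]
        average_graduates = sum(graduates) / len(graduates)
        if (average_graduates < min_avg_graduates
                or program["enrolled"] < min_enrollment):
            flagged.append(program["name"])
    return flagged
```

A screen of this kind only narrows the field. The qualitative judgments the text associates with institutional reviews would still be made in phase two.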

Berdahl (1977) described the differences between two types of per-
formance audits: limited and intensive. He indicated that only a small
number of states engage in intensive reviews. He provided the ex-
ample of a comprehensive performance audit in Idaho that became so
time-consuming that it has never been repeated on that scale again
but rather has been replaced by management audits. The latter ac-
tivity represents the kind of process, Berdahl noted, that has been
advocated by the U.S. Government Accounting Office and others when
faced with limited resources: evaluating the evaluations already being
conducted by others.

Florida: A Case Study 

A process for the systemwide review of all programs at graduate and
undergraduate levels in selected disciplines at the nine state uni-
versities in Florida was begun by the Board of Regents in 1972. After
having placed 16 graduate programs on probation for low pro-
ductivity in 1973, the Board of Regents added 10[?] more programs to its
probation list in 1974 and 39 more in 1975 (Florida Board of
Regents n.d., p. [illegible]). During this three-year probationary period the
majority of academic programs either became more productive or were
merged into similar degree programs. Eleven of the probationary pro-
grams were terminated by the university involved.

The kinds of data that were examined included the number of stu-
dents enrolled, the number of students receiving degrees, admissions
standards, faculty qualifications, curriculum and course offerings,
budget, facilities, equipment, contracts and grants, library holdings,
and placement of students (Hill, Lutterbie, and Stafford 1979, p. 2).

External consultants made site visits to campuses to gather data or
assess the scope of data that had already been collected by institutions
themselves. The consultants were selected primarily from recommen-
dations made by the faculties in the discipline under review, univer-
sity administrators, professional organizations, and accrediting
agencies. In addition to their own perspective of the discipline, they
followed a set of guideline questions provided by the Board of Regents
office encompassing the major categories of program quality, program
priority, career implications, program administration and manage-
ment, and articulation with other programs (Hill, Lutterbie, and
Stafford 1979, p. [?]).

Multicampus Reviews 

In their initial study of multicampus systems, Lee and Bowen
(1971) observed that reviews were conducted on individual campuses
for proposed and new programs. Systemwide considerations were sel-
dom included within such reviews. By their second study in 1975, the
authors noted that systemwide reviews were a common practice for
new programs, taking into consideration mission and academic qual-
ity. Also, in seven of the nine multicampus systems periodic reviews
had been established for existing graduate and professional programs.

That actions by individual institutions are crucial is easily seen
within Enarson's observation that it is anything but self-evident that
state decisions, as opposed to local decisions, are made with a higher
order of rationality and a clear adherence to the public interest
(1975).

Institutional Reviews 

Dougherty (1979) visited one private and nine public research uni-
versities that had reviewed and closed at least one program or that had
undergone a serious financial crisis. He interviewed key administrators
and faculty, examined written documents, and talked with appropri-
ate state-agency representatives. Dougherty found that the authority
to review programs existed at every possible level (departments,
schools, total university or coordinating agency) in one university or
another. Although a certain level may have the formal authority
to review programs, it may be decentralized. For example, the co-
ordinating and governing state agencies for higher education in
Minnesota and Wisconsin, respectively, have allowed the main cam-
puses of the primary state university to carry out program reviews
(p. 8).

Dougherty relates the deficiencies that he finds in the various in-
stitutional program review processes to needs that can be met by
quality institutional research (1979a). Financial analysis [illegible]
program [illegible] very seldom [illegible]. Only one of the ten
institutions studied had engaged in this activity, the [illegible]
university. A [illegible] research activity [illegible] determining
marginal costs when students are shifted from a closed program to
another continuing [illegible]. [illegible] would be to cut down the
time required to complete program evaluations. The primary example
given to point out the potential magnitude of this problem is the
review of all [illegible] to complete. By the time the review was
concluded many program characteristics and personnel had changed.
A fourth use of institutional research is to generate comparable data.
For instance, a statistic of faculty productivity (which is client con-
tact) has great variation across disciplines.

While noting that the approaches discussed by Sprenger and Schultz
[illegible] (1975) provide [illegible] to campuses facing reductions,
Shirley and Volkwein found little [illegible] (1978, p. 17[?]). They
synthesized [illegible] approaches to program assessment to derive a
process for matching important elements of institutional mission:
external need, opportunities, and constraints; and internal strengths
and capabilities. To facilitate comparisons, they selected evaluative
criteria for quality ([illegible], facilities and equipment); need
(centrality to mission, present student demand, projected student
demand, demand for graduates, locational advantages); and cost
[illegible] rating scale. [illegible] place programs in five
categories overall: (1) programs to be continued at [illegible] level
of activity regarding resource level, enrollment, and [illegible] of
faculty; (2) programs to be continued at a reduced level of
[illegible] and resources; (3) programs [illegible] developed
further; (4) programs now in existence that will be phased
out; and (5) new programs to be developed (p. 178).

A survey of 195 innovative institutions was conducted in 1975 by the
Center for Research and Development in Higher Education at the
University of California, Berkeley, [illegible] of evaluation
practices (Hodgkinson, Hurst, and Levine 1975). The response rate was
[illegible] percent ([illegible] institutions). Only slightly less than
one-third of the institutions had an institutionwide committee on
evaluation. In the process of determining which standardized
instruments were used in any of four evaluation areas (student
characteristics, environmental measures, institutional goals, and
course evaluation), there was a strong preference for locally
developed instruments noted (p. 4). The analysis of purposes to which
different types of institutions have put evaluation data showed no
significant differences by type of control or level of highest
offering. When asked about the most important evaluation problem with
which they had to deal, 72 percent indicated that it was determining
the effectiveness of new or existing programs (p. [?]).

Several researchers have examined methods that have been em-
ployed by public research universities to review their existing pro-
grams (Braskamp 197[?]; Dougherty 1979b; Hall 1978; Shirley and
Volkwein 1978). Regular committees or special task forces are the
usual vehicles for conducting such reviews. Hall concluded that the
principal benefits of the program reviews were increased information
for participants and additional scrutiny of administrative decisions
(1978).

Wood and Davis (1978) also summarize common methods of
evaluating existing curricula: analyses of student transcripts, tests of
academic competencies, competency-based education of entire in-
stitutions, comprehensive examinations, examples of student work,
institutional self-study instruments, surveys of current and former
students, surveys of faculty, and program reviews.

At the same time that institutions and departments within them
are subject to mandatory program reviews by state-level offices and
agencies, they are also "voluntary" participants in program evaluations
conducted by visitation teams from regional and professional accredita-
tion associations.

Accrediting Reviews 

From a content analysis of the published criteria for the six re-
gional accrediting associations, Troutt (1979) identified five common
criteria that were claimed by the accrediting groups to have some
relationship with quality assurance: institutional purposes and ob-
jectives, educational programs, financial resources, faculty, and the
library learning center. He could find no solid pattern of results
from research studies to confirm or deny the claim that any of these
characteristics represents a measure of institutional quality.

The extensive report by Harold Orlans et al. (1975) on private
accreditation and public eligibility ends with the conclusion that
neither private accrediting agencies nor government authorities are
able and/or willing to control consumer fraud in education.

The Carnegie Council on Policy Studies in Higher Education (1979, 
p. 63) has recommended many changes in the practices of regional 
accrediting associations including: increasing the number of trained 
full time staff to assist visiting teams, publishing periodically a report 
on the status of all schools that are members or have applied for 
membership, and making public final institutional evaluation reports.

Departmental Reviews 

M. Clark (1977) surveyed 150 diverse and representative depart-
ments in colleges and universities on their program review practices.
She found that 60 percent of the departments reviewed both their
undergraduate and graduate programs. Objective statistics on various
departmental characteristics (such as faculty training, experience, and
publishing; number of degrees awarded per program; physical and
financial resources; and student enrollment) were collected by 80
percent of the departments for internal use, which was less often than
the same information was collected for outsiders. More subjective
information (such as student evaluations of courses and teaching,
faculty and student ratings of the departmental learning environment,
and student judgments about their educational experiences) was col-
lected more frequently for departmental use than for external use
(Clark 1979, p. 2).

After demonstrating limitations of peer reviews for determining
the quality of graduate departments, Clark (1974) shows the receptivity
of graduate deans to a variety of multiple measures of quality. An
exploratory study using departments of chemistry, history, and
psychology at 25 universities across the country demonstrated that reliable
judgments could be made from student, faculty, and alumni
responses about program activities, procedures, and the learning
environment (Clark, Hartnett, and Baird 1976).

A thorough description of the historical development and present
implementation of the process for evaluating academic departments
at the Urbana campus of the University of Illinois is provided by
Smock and Hake (1977). The authors maintained that this systematic
evaluation process differs from those elsewhere in that: (1) it is built
on an extensive foundation of thought and planning going back a
number of years, and (2) it is faculty based and largely a self-evaluative
effort (p. 1). The coordinating group for this evaluation process is
the Council on Program Evaluation (COPE), which is headed by an
Associate Vice Chancellor for Planning and Evaluation. A narrative
report is written by faculty and administrators in the department
following guidelines established by COPE. Discussed within the
report are questions about the view of the discipline nationally, faculty
research and service activities, operational procedures, and present
problems. In addition, statistical data about tenure, promotion, courses
taught, and budget are provided by central university offices for
inclusion in the report. The narrative report is submitted to COPE as
raw data and is considered confidential. After discussion, COPE
produces a public action report, which is then released, containing a
summary of the self-evaluation, the recommendations made by COPE
itself, and the reaction of the department to those recommendations
(p. 7). Since COPE sends its recommendations through normal
administrative channels, its ideas get interpreted by individuals whose
job it is to stay attuned to such external demands on the institution
(p. 8).

Poulton (1978, p. 9) discusses the ways in which the usefulness of
information provided by program reviews and the kinds of actions
that are influenced by program reviews are different for various
organizational levels: the department or program, the school or
college, and the central administration. Single-unit reviews have the
greatest usefulness for the unit examined. Benefits include improved 
procedures, clarified goals, improved internal communication, and 
improved rationale for resources. Problem areas are more readily 
recognized and approached in a rational manner. 

At the college level, the information gathered from the review up- 
dates the existing knowledge of college staff, and permits defining 
current trends in the discipline. Generalizations about problems
common to several units will slowly emerge. Reviews can be the basis for
deans to reallocate funds and faculty.

The central administration receives the least direct benefit from
a single program review. In rare instances it may result in major
organizational changes or budget cuts. Usually it will allow top
administrators an opportunity to observe the health of an academic
unit as it responds to the review process and the recommendations
arising from it.

Speculations by One Level of Review About Another

It is important to bear in mind the different values and procedures
that are involved within program evaluation at different levels of
organization. To obtain state and local agencies' views on federal
educational program evaluations, the General Accounting Office
(Comptroller General 1977) sent questionnaires to state education
agencies and a statistical sample of local school districts throughout
the nation. Although state officials viewed federal managers as being
most impressed by standardized norm-referenced test results, and local
officials viewed state and federal officials in the same manner, state and
local officials said that they themselves were not most impressed by
such results (p. v). In contrast, state officials said that they were most
impressed by results from criterion-referenced tests, while local
officials said that improvements in curriculum and gains in the affective
domain were most valued.

Goals 

Fincher (1978, p. 1) observed that institutions find it necessary to
take an inventory of goals and objectives when there is a loss of
sustained momentum or a failing sense of direction. As already discussed
in an earlier chapter, institutions of higher education have been
subjected to rigorous questioning of purpose during the past decade. The
response has been the creation and use of schemata of goals and
objectives such as those developed by Peterson and Uhl (1977), Gross and
Grambsch (1974), and Lenning et al. (1977). Such schemata are being
used increasingly in program evaluations.

Conrad (1971) declared that in most universities, goals are often
implicit, residing in an extended body of collective understanding
rather than in explicit statements. Romney and Micek (1977)
describe efforts to translate goals into measurable objectives. They note
that a major difficulty in making this translation is the identification
of, and agreement on, pieces of evidence that demonstrate progress
toward the achievement of established goals. As an example, the
Institutional Goal Inventory (IGI), an instrument developed by Educational
Testing Service (Peterson and Uhl 1977), was used to compare the
existing and ideal goals of faculty, trustees, and administrators at
15 colleges representing six major categories of institutions. The
chief conclusion was that very little actual measurement of the outcomes
that the different groups thought ought to be measured took place.

Outcomes 

Lenning (1977) provides a thorough analysis of extant configurations
for measuring the outcomes of postsecondary education. He
summarizes measures that affect individuals, institutions, society in
general, and these groups simultaneously. He lists six attributes of an
outcome. First is form: an outcome must be a product, an event, or
a condition. Second is the change status: whether the outcome
preserves or alters the status quo of relationships and/or conditions. The
essence of what is changed or maintained, known as focus, is the third
attribute. The value neutrality of an outcome is the fourth attribute.
The ease of measurement is the fifth attribute, and the final attribute
is duration.

Five other factors boost understanding of outcomes. They are:
which functional unit of the institution produces the outcomes; for
whom the benefit is intended and who actually receives it; whether
the outcome was intended; and when and where the outcome occurred
(p. 20).

After careful work in building a comprehensive outcomes structure, 
NCHEMS developed a series of standardized questionnaires to collect
information from two-year and four-year college students in five
categories: those just entering, those re-enrolling, those leaving without
completing a program, those graduating or completing a program, and
recent alumni (Gray et al. 1979). These instruments are already focal
points for many program evaluations.

Social Indicators 

The Department of Education in Oregon has a goal of setting its 
course of activities based on empirically verified needs. Without suf- 
ficient funds to expand student assessment efforts to gather that data, 
the use of indicators has been considered the most reasonable
alternative (Impara 1977). Clemmer (1977) discussed the conceptual social
indicator model that has been derived for the Oregon setting. Several
types of indicators have been included in the Oregon approach: input,
context, output (performance), and societal (side-effect). Based on a
review of the literature of social indicators and analyses of the
proposed use of indicators by the Oregon Department of Education,
Jaegar suggested certain criteria for the selection or development of
indicators. They should be: expressed in quantitative terms;
time-referenced; directly and demonstrably related to a statewide goal; for
input indicators or context indicators, demonstrably related to at least
one performance indicator or societal indicator; and accompanied by
estimated measurement error for user enlightenment (Jaegar 1977,
p. 22).

However, Jaegar declared that the principal contributors to this area
of research seem almost oblivious to problems of psychometric
adequacy, although he does credit Land and Spilerman (1975) and
de Neufville (1975) with considering construct validity problems briefly
(Jaegar 1977, p. 6).

Van Alstyne (1978) provides us with a perspective for seeing the
use of social indicators at institutional and total system levels of
postsecondary education. She notes that diversity and access are
meaningful only when applied to the whole system of opportunities (p. 460).

Rossi and Gilmartin (1977) speculate that the future of social
indicators will be characterized by a much wider group of data
consumers. This is because the average person in future societies will
have greater numerical abilities than is presently true. Another reason
is that more government programs will require program evaluations.

Not Getting Trapped by Goals 

Weiss has concisely stated that too much attention to the goals of a
program can diminish the impact of an evaluation: among the many
reasons for the negative pall of evaluation results is that studies have
accepted bloated promises and political rhetoric as authentic
program goals (1973, p. 44).

Floden and Weiner (1978, p. 4) observed that because few goals
have the strong support of a majority of citizens, the goals written into
legislation will be vague enough to permit different constituencies to
read in their own goals. Further difficulties in measurement arise
because the outcomes of programs are obscured by the complex social
context in which they occur, and public goals often change during an
evaluation process.

Deutscher (1977) summarized previous research on organizational
behavior, noting that organizations are rarely what they pretend to be
by virtue of their stated goals. He referenced specific theoretical
concepts that were developed for describing the evolution of goals,
including goal displacement and goal succession.

Three ways to avoid the goal trap during program evaluation have
been offered by Weiss (1973): (1) view success in terms of process,
(2) be attentive to the unintended, and (3) negotiate a realistic
scenario.

Purposes and Process



There are so many ways to conduct program evaluation that
guidance is needed from respected practitioners about how to take
the right action at the right time. In addition to the selection of
formal evaluation models, decisions also have to be made about
implementation of process elements such as Who? What? and Why?

Purposes 

Lent (1974, p. '2$) offers a useful trichotomy of purposes of program
evaluation: program planning, program improvement, and program
justification. Activities for program planning include an accurate
diagnosis of client needs, the identification of program supporters and
opponents, and delineation of means for attaining desired goals.
Program improvement activities include comparisons with similar
programs, improved communications among program participants,
observations of whether strategies are working as planned, and
conclusions about the adequacy of program responsiveness and flexibility.
Activities viewed as falling under program justification may be listed
as measuring the level of continued support for a program, discovering
what supporters and opponents want to know about the program,
demonstrating adherence to authorizing agreements, and advocating
a future status (expansion, reduction, maintenance, or elimination of
the program).

Phases of an Evaluation Process

Harshman (l!)7«>) examined many evaluation models or approaches,
taking note of similar component activities. Three phases of activities
were observed. First is the foundation phase that forms the basis for
subsequent evaluation actions. Activities included in this first phase
would be identifying the decisions to be made, determining the goals
and values of importance, and setting standards or establishing
criteria. A second phase of activities, the information phase, consists of
the specification, collection, analysis, and reporting of information.
The judgments phase, which is the last, consists of three steps: a
comparison of the reports from the information phase with the standards
specified in the foundation phase; an analysis of how and why various
elements combine to produce an effect; and making recommendations
(p. 25).

Process Particulars 

Mims (1978) provides an excellent discussion of options for designing
a review process. In general, there are four types of groups for
producing a program review process design: administrators, faculty,
consultants, and a mixture of the other groups. A mixture group may
be more useful and have greater support from the groups represented.

Three sources of ideas for the design process were described:
adoption intact from other sources, modification of a process used
elsewhere (adaptation), and specially created processes. While adoption
of a process would result in the shortest time before implementation
and would permit the use of a proven process, its results may be least
fitted to the needs of the institutional constituents.

Mims also discussed several general characteristics of design
processes. The design group can choose to operate in an open,
consultative style or a relatively closed style. Some degree of openness is
said to promote acceptance of the design and of the results (p. 5). All
specifics of the review process can be spelled out prior to
implementation, or some details can be determined later. The former method of
planning is termed a complete, comprehensive design, while the latter
is called an emergent or phased approach. An evolving design process
is thought to be appropriate when the institution is undergoing rapid
change, when there is genuine uncertainty as to how to proceed, and
when the institution is not highly experienced in program review
(Mims 1978, p. 5).

Timing 

The timing and form of evaluation activities depend very much
on the purpose of the evaluation. The three major purposes of
evaluation are represented by needs assessment, formative evaluation, and
summative evaluation (Ball 1979). Needs assessment provides data
for deciding whether to start a program. Formative evaluation is
designed to provide feedback to program managers to aid in program
improvement through modification. It usually occurs during the initial
implementation of a program.

In contrast, summative evaluation is used to determine the overall
worth of a program as the basis for expansion, maintenance, or
extinction. It normally takes place after one complete cycle of program
service has been completed and is frequently done by an outside
evaluator. Stake offers a clever way to remember this sequence of
evaluation: "When the cook tastes the soup, it is formative
evaluation, and when the guests taste the soup it is summative" (1976, p. 19).

Although the formative/summative dichotomy has become almost
a universal truth after its introduction by Scriven in 1967, Scriven
has recently (1970) introduced the concept of preformative evaluation.
Activities said to characterize this stage of evaluation are getting the
evaluation budgeted and staffed, predicting probable program effects,
and collecting baseline data.

Smith and Sanders (1975, p. 4) describe a predevelopmental phase 
of formative evaluation which includes the logical and empirical 
analysis of needs; in contrast, Bloom, Hastings, and Madaus (1971,
p. 91) separate a diagnosis phase of activity from formative evaluation.
Thus far, no common terminology is found universally.

Standing Committee versus Ad Hoc Task Force

Dressel (1976) pointed out some problems of using a standing
committee. It must rely on the same channels of communication for
gathering information as it has used for all other tasks. Its members
establish interpersonal ties they feel obligated to defend. To avoid
this latter difficulty, the State University of New York at Albany
employed only external consultants to review its graduate programs
(Mingle 1978). On the other hand, the Academic Vice Chancellor of
the University of Illinois at Urbana carefully balanced disciplines
and perspectives of education in choosing committee members
(Braskamp 1979).

Comparative Approach 

Scriven (1967) distinguishes between comparative and noncomparative
evaluation, choosing the comparative orientation since decisions
frequently have to be made among competing alternatives. He
maintains that it is enough to identify which program can produce the
greater effects without explaining why one program works better
than the other. This task Scriven sees as belonging to the
educational researcher.

Whether it is program personnel or external evaluators that have
begun to employ a comparative evaluation process, it is necessary to
identify an appropriate domain of peer institutions. Increasingly, the
techniques for making peer comparisons have become more systematic.

Recently, in employing a cluster-analytic model for grouping peer
institutions for research and administrative purposes, Terenzini et al.
(1979) indicated improvements over earlier techniques for classifying
institutions by reducing the arbitrariness and a priori specification of
the classification structure, as well as the inability to accommodate
more than a few classification criteria. The authors cautioned readers
about the need to relate aspirations to environmental constraints,
while avoiding inappropriate levelling of quality. In a very formal
way, both Michigan and New York have incorporated the peer
institution into their state funding formulas (p. 21).

Measures of Use 

Hutchinson describes three criteria for measuring the use of
evaluative data by decision makers. The first criterion, completeness, is the
percentage of the decisions of a given decision-maker that are made to
some extent with the use of the evaluation data. The decision-maker
is asked to maintain a log of what information, if any, is used in
making a decision. Focus, the second criterion, indicates the extent to
which data were provided for the most important decisions. Hence,
the decision-maker is asked to designate on the decision/data log which
are the more important decisions. Finally, the third criterion,
efficiency, refers to the percentage of the evaluation data used in making
decisions. The use of predesigned data-gathering techniques can be
the source of very poor efficiency of evaluation data use.
Suchman (1967) stated that measures of efficiency arise from the
examination of alternative program approaches in terms of costs: money, time,
personnel, and public convenience.
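
The three criteria of use lend themselves to simple arithmetic on a
decision/data log. The sketch below is an illustration only, not
Hutchinson's own procedure: the Decision record, the use_measures
function, and the sample entries are hypothetical, and focus is read
here as the share of decisions flagged as important that drew on any
evaluation data.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """One entry in a decision-maker's decision/data log (hypothetical layout)."""
    name: str
    important: bool                               # flagged by the decision-maker
    data_used: set = field(default_factory=set)   # evaluation items consulted


def ratio(part, whole):
    """Return part / whole, or 0.0 when there is nothing to divide by."""
    return part / whole if whole else 0.0


def use_measures(log, data_items):
    """Compute completeness, focus, and efficiency from a decision log."""
    with_data = [d for d in log if d.data_used]
    important = [d for d in log if d.important]
    important_with_data = [d for d in important if d.data_used]
    # Evaluation items that informed at least one logged decision.
    used_items = set().union(*(d.data_used for d in log)) & set(data_items)
    return {
        "completeness": ratio(len(with_data), len(log)),
        "focus": ratio(len(important_with_data), len(important)),
        "efficiency": ratio(len(used_items), len(data_items)),
    }


# Illustrative data only; none of these items come from the source.
data_items = {"enrollment", "alumni_survey", "cost_per_credit", "peer_ratings"}
log = [
    Decision("renew lab funding", True, {"enrollment", "cost_per_credit"}),
    Decision("revise syllabus", False),
    Decision("hire adjunct", True),
    Decision("drop elective", False, {"enrollment", "alumni_survey"}),
]

print(use_measures(log, data_items))
```

A low efficiency figure in such a log would point to evaluation data
that were gathered but never consulted, which is the risk the text
attributes to predesigned data-gathering techniques.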

Cost 

Cost-effectiveness analysis emerges from the more general framework
of cost-benefit analysis. The comparison of the monetary value of
benefits with the monetary value of costs provides a measure for
assessing the relative attractiveness of alternatives (Levin 1975, p. 6).
A basic assumption for employing cost-benefit analysis is that the
benefits of a given program can be valued by their market prices.
Unfortunately, many social program outcomes do not have market values.
In these many instances it is more important to relate costs to the
actual physical or psychological outcomes rather than their monetary
value (p. 9). Such a cost-effectiveness approach can be modified later
into a cost-benefit analysis after experience demonstrates a market
value for services and products.

One additional and similar approach to evaluation deserves
mention. Cost-utility analysis employs the decision maker's subjective views
in valuing the outcomes of alternative strategies (Fisher 1964, pp. S3-
19). This technique is thought to be most useful where a complex set
of outcomes is associated with each alternative course of action
(Lifson 1968). Once the decision maker determines the outcomes of
alternative policy approaches and their utilities to him, these utilities are
related to costs and probabilities of achieving the anticipated
outcomes. The main objective becomes that of achieving the greatest
utility within a fixed budget (cost).
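
A minimal sketch of this cost-utility reasoning follows. It assumes
that each alternative can be summarized by a cost and a list of
(probability, utility) pairs, and that the decision maker simply picks
the affordable alternative with the greatest probability-weighted
utility. The function names, program names, and all figures are
hypothetical and are not drawn from Fisher or Lifson.

```python
def expected_utility(outcomes):
    """Sum the probability-weighted utilities for one alternative."""
    return sum(probability * utility for probability, utility in outcomes)


def best_within_budget(alternatives, budget):
    """Return the name of the affordable alternative with the greatest
    expected utility, or None if nothing fits the budget."""
    affordable = {
        name: alt for name, alt in alternatives.items() if alt["cost"] <= budget
    }
    if not affordable:
        return None
    return max(
        affordable,
        key=lambda name: expected_utility(affordable[name]["outcomes"]),
    )


# Hypothetical alternatives: cost in dollars, outcomes as (probability, utility).
alternatives = {
    "tutoring center": {"cost": 40_000, "outcomes": [(0.7, 8), (0.3, 2)]},
    "peer mentoring": {"cost": 25_000, "outcomes": [(0.5, 9), (0.5, 3)]},
    "new advising staff": {"cost": 90_000, "outcomes": [(0.9, 10), (0.1, 4)]},
}

print(best_within_budget(alternatives, budget=50_000))
```

Because the utilities are the decision maker's own subjective values,
a different decision maker could supply different numbers and reach a
different choice from the same costs and probabilities.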

Levin (1975) provides a discussion and examples of how to
determine the costs of program components. He advises evaluators to
expend effort to determine costs in proportion to the anticipated
magnitude of contribution that the particular type of resource will
make to the total cost of the program (p. 29). Levin also demonstrates
that there are situations for which a marginal cost-effectiveness
approach is most appropriate. For example, it would be valuable in
helping a decision maker choose between expanding an existing
program or initiating another similar one (p. 50).
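
The marginal comparison can be illustrated with a short calculation.
The sketch assumes that each option is described by the additional
cost it would incur and the additional units of outcome it would
produce; the function name and all figures are hypothetical, and Levin
(1975) should be consulted for the full treatment.

```python
def marginal_cost_effectiveness(added_cost, added_outcome):
    """Extra cost per extra unit of outcome for one option."""
    if added_outcome <= 0:
        raise ValueError("option adds no outcome")
    return added_cost / added_outcome


# Hypothetical figures: (additional dollars, additional students served).
options = {
    "expand existing program": (30_000, 60),
    "initiate similar program": (50_000, 80),
}

ratios = {
    name: marginal_cost_effectiveness(cost, outcome)
    for name, (cost, outcome) in options.items()
}
cheapest_per_unit = min(ratios, key=ratios.get)
print(ratios, cheapest_per_unit)
```

Only the added costs and added outcomes enter the comparison; costs
already sunk in the existing program are left out, which is what
distinguishes the marginal from the average approach.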

The speed with which one program rather than another produces 
desired results can be an important consideration (utility) for a de- 
cision maker, even in preference to a long-term greater effectiveness by 
another (slower) program. An appropriate discounting of value (utility) 
in accordance with the preference for speed is required in such
circumstances.
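
One way to express such a preference for speed is to discount a
program's utility by the time that passes before its results appear.
The text does not prescribe a formula; the sketch below assumes the
familiar compound form, utility / (1 + rate) ** years, and the rate,
utilities, and waiting times are hypothetical.

```python
def discounted_utility(utility, years_until_results, rate):
    """Discount a utility by the time that passes before results appear."""
    return utility / (1 + rate) ** years_until_results


# Hypothetical programs: the slower one is ultimately more effective.
fast = {"utility": 60, "years": 1}
slow = {"utility": 80, "years": 4}

for rate in (0.0, 0.15):
    fast_value = discounted_utility(fast["utility"], fast["years"], rate)
    slow_value = discounted_utility(slow["utility"], slow["years"], rate)
    preferred = "fast" if fast_value > slow_value else "slow"
    print(rate, round(fast_value, 2), round(slow_value, 2), preferred)
```

With no discounting the slower program is preferred on effectiveness
alone; a decision maker who values speed highly enough, represented
here by the larger rate, reverses that ordering.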




Pressing Problems 



Sanders (1979, p. 11) reviewed seven current evaluation primers on
the basis of IS essential topics. Those receiving the lightest coverage
were determining value, maintaining ethical standards, adjusting for
external factors (political considerations), and evaluating evaluations.
These situations and a few other pressing problems appear to deserve
the additional discussion that follows.

Ethics and Standards 

The planning for an evaluation should include an anticipation of
potential conflicts among clients, evaluators, and audiences. As a
standard practice there is a need to specify: (1) which existing
program records will be examined; (2) what will be done with the
collected data; and (3) who will have access to the completed evaluation
report. Agreement should be reached on which sources of information
are to remain anonymous and what safeguards will be taken to
insure such confidentiality.

Understanding must also emerge on how much freedom evaluators
have to collect information beyond that specifically requested by the
contracting organization. Regardless of whether the latter condition is
the subject of agreement, consensus is also needed on the extent to
which outsiders can review the evaluation findings. Such a consensus
may be part of a statement about the right of the client to terminate
an evaluation process.

Anderson and Ball (1978) strongly advocate that evaluators should
make their personal and professional value preferences clear. Most
usefully, evaluators discuss how their biases are reflected in
the choice of elements within the research design and implementation.
As an aid to evaluators, Anderson and Ball have prepared a table
(pp. 122-123) which shows how the design, measurement, analysis, and
interpretation of an evaluation can be altered by an evaluator's
preference for any one side of up to seven different bipolar ways of viewing
the scope of an evaluation: (1) phenomenological versus behavioristic;
(2) absolutist versus comparative; (3) independent versus dependent;
(4) programmatic versus theoretical; (5) narrow scope versus broad
scope; (6) high intensive versus low intensive; (7) process versus
product. Also, professional value differences frequently emerge from
the different disciplinary backgrounds of evaluators. For instance, a
sociologist may have preferences for context measurement as a
primary method for conducting an evaluation, while a psychologist may
wish to focus on interpersonal or individual development.

Anderson and Ball have prepared another table that spells out
specific ethical responsibilities of both the evaluator and the
commissioner of an evaluation during different phases of an evaluation
process: making a contract; fulfilling the contract with as little
interference of the program as possible; handling delicate extracontractual
matters (unlawful practices, unsound activities); disseminating a
balanced (objective) report; and guiding secondary evaluators (pp. 150-
152).

In a background paper for the National Commission for the
Protection of Human Subjects of Biomedical and Behavioral Research,
Campbell and Cecil (1977) assert that research in program evaluation,
social experimentation, social-indicator research, survey research,
secondary analysis of research data, and statistical analysis of data from
administrative records are and should be covered by Public Law
93-348 and other rights-of-subjects legislation. In such situations, they
recommend the use of a conditional clearance affidavit in lieu of a
full review by the appropriate institutional review board in most
cases. Further, the authors suggest that changes in data collection
procedures would be reviewed, not changes in administrative policy
implementation. Also advocated by the authors is extending the right
of informed consent into these areas of research, plus informing
respondents of the risks of verificational interviews and subpoena of
information where these risks exist (p. 21).

Another major focus of the ethics of program evaluation centers on
whether there should be professional standards for evaluators. Prior
to discussing specific qualifications under consideration across the
country, it is important to take note of objections raised against
making use of any standards at this time. Such objections have been
summarized succinctly by the U.S. General Accounting Office (1978). In the
first place, standards may unduly restrict the supply of evaluators,
driving up the cost of collecting evaluation data. Secondly, it has been
hypothesized that the present time may be too early in the evolution
of the evaluation profession to develop meaningful standards. In
addition, it has been urged that evaluation be viewed more
like the journalism profession, which is seldom restricted, rather than
the more regulated professions of law, medicine, and accounting.
Finally, it is observed that standards for the sufficiency of evidence
may restrict an evaluator's capacity to provide data in time for use
in a decision (p. 22).



Standards

In May 1974 the Joint Committee on Standards for Educational
Evaluation began work on a volume to deal with issues and criteria
for program and curriculum evaluation. This committee has 17
members representing a variety of educational organizations: the American
Educational Research Association, American Psychological
Association, National Council on Measurement in Education, American
Personnel and Guidance Association, American Association of School
Administrators, Education Commission of the States, Association for
Supervision and Curriculum Development, American Federation of
Teachers, National Association of Elementary School Principals,
National Education Association, and the National School Boards
Association. With a grant from the Lilly Endowment a nationwide panel
of M experts was given a chance to discuss and write about various
educational standards. Then a series of public hearings was held to
permit wide opportunities for input and consideration.

Although the committee had initially planned to prepare both a 
detailed and a condensed version of the standards, work has stopped 
on the simplified version. The detailed version will probably include 
for each standard: a rationale, a list of guidelines for meeting the 
standard, a list of pitfalls commonly encountered by inexperienced 
evaluators in meeting the standard, and potential conflicts between 
the given standard and others. The four major areas of focus are
accuracy, utility, propriety, and feasibility. The final standards should
be published in 1980.

Brzezinski (1979) used the draft standards for educational
evaluation in analyzing eight evaluation reports, which had been nominated
to Division H of the American Educational Research Association for
consideration in their 1978 evaluation awards competition. Her
self-admitted cursory review showed that seven of the thirty standards
were quite evident (six of the seven were from the accuracy category).
The seventh prominent standard was balanced reporting from the
propriety category. References to an additional ten standards were
discernible, mostly from the accuracy category, but there were three
from the utility category and one from the feasibility category. Finally,
she concluded that thirteen of the total thirty standards were not
discernible in the evaluation reports examined. Four standards were
from the utility category, seven were from the propriety category,
and two were from the feasibility category. Those reports displaying
the greatest attention to educational standards were the longest and
most technical (p. 5).

The Federal Joint Dissemination Review Panel (JDRP) has
established 18 required considerations to determine quality educational
products (Tallmadge 1977). Several of their primary considerations
are defined by elements found in at least two of the areas of the
Standards for Educational Evaluation. For instance, this is true for
educational significance, generalizability, and credibility. However,
after reviewing the criteria stipulated for endorsement by the JDRP,
Hopkins (1977) expressed a strong reservation about their use. From
an economic standpoint, it was thought unlikely that any sponsors
would grant the time and funds needed to demonstrate satisfaction of
all JDRP standards. Questioned also was the vagueness of certain
standards (e.g., how professional the materials look).

Political Considerations 

Anderson and Ball (1975) provide a concise definition of the politics
of evaluation. It refers to the contest of power and wills in the
evaluation setting, aimed at getting credit for success or avoiding blame for
failure of current programs (p. 282). They observe that several
common rationalizations may be given to diminish the value of doing an
evaluation: the effects of the program are long range and cannot be
measured well in a short period of time; the effects of the program
are not measured well by existing instruments because they are very
complicated; the impact and effectiveness of the program cannot be
properly measured yet because the individuals most in need of it are
not yet participating; and only those people who have been intensely
involved in the program can estimate its impact.

Weiss (1975, p. 185) provides a vivid description of how different
audiences await an evaluator's data as tactics in an ongoing political
struggle. Politicians are concerned with satisfying constituents and
keeping politically advantageous programs alive, whether or not they
accomplish their stated goals.

In the first issue of the newsletter of the Evaluation Research
Society, Cronbach (1977, p. 1) declares that evaluation is first and
foremost a political activity. What is said to be needed is a political
scientist to ask Lasswellian questions: What is the motivation for
setting up an evaluation? for agreeing to let evaluators collect data
in one's school or community? for focusing upon, or ignoring, the
evaluator's report?

Use of Evaluative Data

The extent to which information is used to make decisions that will
improve program performance is negligible according to several
organizational theories (March and Simon 1958; Cyert and March 1963;
Steinbruner 1971). They show that organizations search for new ideas
and practices only when current performance falls below satisfactory
levels. It must be further noted that the time required for evaluative
information to influence decision makers is much longer than usually
suggested (Cohen and Garet 1975).

Poulton (1978) discusses factors in the organizational environment
that influence the use of program reviews. First, he considers
manageable factors. The credibility and perceived openness of the
process are important. Credibility is enhanced by the use of top-quality
people and recommendations that can be generalized to several settings
or programs (p. 12). Internal communications to program staff members
that convey fairness, candor, and flexibility boost the use of
evaluations. The early delineation of what roles different people will
play in the evaluation process maximizes an understanding of the extent
of involvement and support for an evaluation.

Poulton also discussed factors that were not manipulable by the
evaluator: how responsive given organizational units will be to
participating in a program review, and whether the administrative
climate is sufficiently stable and supportive to permit development and
implementation of the program review.

The principal recommendations offered to guide evaluators in the
preparation of useful reports focus on determining who the decision
makers will be, what information they will need, and when they will
need that information. Morris and Fitz-Gibbon (1975) describe a
process for figuring likely negative attitudes of the audience members
and then adjusting the reporting process accordingly. For example,
faculty members may have the attitude that special meetings are
a drain on their valuable planning time. Recommended adjustments
include putting the information in a brochure or on the bulletin board,
or appearing only at a regularly scheduled faculty meeting (p. 30).

A major suggestion for preparing a presentation scheme is the use
of multiple formats. In addition to a detailed report, leaflet-like
summaries and oral presentations are also advocated.

Within the same report, encouragement is given to presenting data
in diverse ways: verbal, numerical (tables), and graphical. When tables
and graphs are to be employed, they should be written prior to the
narrative. Both tables and graphs should be designed to be
self-explanatory, since some readers may only look at them and not read
the report.

Morris and Fitz-Gibbon (1975) provide readers with an entire chapter
on how to prepare a variety of graphs and tables so that they are
most easily understood by audiences. Particularly for an oral report,
the authors recommend starting with simple and large visuals. In some
cases the indication of trend data as shown by plus-and-minus signs
may be easier to understand than actual data (p. 58). Tables are quite
appropriate for showing the relationship among program components
and the timeline for a project.

Several suggestions are made by Morris and Fitz-Gibbon (1975) on
how to improve the readability of the evaluation report: define
technical terms that are likely to be unfamiliar; use active verbs; use
short sentences and paragraphs; and personalize the narrative whenever
possible (p. .15). Popham advocated the inclusion of a verbatim
transcript or anecdotal accounts of specific events to illustrate what
activities are transpiring within the program (1975, p. 261).

The employment of adversary (contrasting) descriptions of the
same program may help readers see more clearly the range of advantages
and disadvantages of a given program than an "objective" view
from one viewpoint. For such an approach, one staff member is given
the responsibility to paint a positive picture while another staff
member would paint a negative picture. The detrimental effects of
having writers of unequal writing skills can be softened by having one
person edit both viewpoints or one person write both the positive and
negative accounts. This approach has been well exemplified by Stake
and Gjerde (1975).

Providing program staff with an opportunity to offer a rejoinder to
the evaluator's recommendations before they are made public is the
subject of considerable debate. For oral presentations, using a
co-presenter from the agency being evaluated is a recommended technique
in lieu of facing hostile audiences. Involving the audience in
the presentation also lowers the resistance of people to new
information.

Future Influences on Program Evaluation



A spatial configuration described by Windle (1978) has considerable
merit as an aid in grasping the current divergent thinking about
program evaluation and its likely destinations in the future. He views
the scope of interests in program evaluation as three concentric circles.
The smallest scope of interest is represented by viewing program
evaluation as simply research to produce objective information others
can apply. A larger sphere of interest is reflective of program values
and those of its staff. Finally, the largest sphere of interest
encompasses both smaller spheres of interest and also includes three
societal interests in services: taxpayer advocacy, citizen advocacy
(protection and problem solution are wanted), and consumer advocacy
(fair treatment, which works toward independence).

Systematic Research about Techniques

Popham (1975) describes the critical need for systematic research
about the effectiveness of various evaluation techniques. For instance,
he notes that there is virtually no empirical evidence about the most
effective needs assessment approach or how to weigh criteria in
reaching a summative judgment (p. .111). The amount of interference with
ongoing program activities also needs to be part of the investigations
of which techniques are most effective.

Gephart (I p. 2) recently declared that we now have a critical
mass of people working on the explication of evaluation such that it
no longer seems wildly optimistic to predict that work currently
underway will merge by the start of the next decade to give us the
conceptual and methodological clarity that has been so elusive.

Alternative Evaluative Criteria 

To more fairly reflect the objectives and values of unconventional
programs, it is seen as necessary to employ alternative criteria for
evaluation. Gooler (1979) discusses one example, criteria that are
particularly relevant to the nontraditional delivery approaches, which
he labels distance education. Access is said to refer to how many people
and what kinds of people have program resources available to them. A
second criterion would measure the appropriateness of the program to
the needs and expectations of target groups served. Gooler cautions
his reader that the relevancy is not static, since needs do change (p.
47). The quality of program offerings is a third criterion. Included 
within this criterion is the logic of the product and its scope. A 
fourth criterion consists of learner outcomes, both intended and un- 
intended, including the attitudes that participants develop toward 
learning in general as well as toward the specific content covered. 
The impact of a program is the fifth criterion to be examined. It re- 
fers to the extent to which the program influences the mission, goals, 
and practices of other programs, institutions, or individuals. The sixth 
criterion is cost-effectiveness. It is important to measure the
comparative cost-effectiveness of the program in relation to alternative uses
of the same resources. Finally, the increase of our knowledge about 
the general field of delivering educational opportunities is also a 
criterion for evaluating such programs (p. 50). 

Taxpayer Advocacy 

Recent events in a number of states reflect increased taxpayer
concern about the cost of government programs and their effectiveness.
The shift of revenues for community colleges from local property
taxes to state-level appropriations as a result of Proposition 13-type
tax-relief measures may result in greater scrutiny of institutional
programs by state officials. McCartan quotes the California Legislative
Analyst as suggesting that Proposition 13 changes may mean the
reevaluation of the state's policy toward oversight of the community
colleges (p. 39). In fact, it is observed, the State Department of
Finance, the Legislative Analyst, and the California Postsecondary
Education Commission have endorsed the practice of annual reviews of
community college budget requests.

The development and use of indicators of program administration
represent a promising avenue for improving program evaluation.
Sigelman (1976) suggested seven standards for evaluation of the quality
of administration in the American state governments: professional
quality as defined by (1) expertise, (2) information processing
capacity, (3) innovativeness, (4) efficiency; and political quality as
defined by (5) representativeness, (6) partisan neutrality, and (7)
integrity. These standards were derived from the research literature on
public and private organizations.

The embryonic nature of research in this area is demonstrated by
the fact that Sigelman did not suggest any indicators for the standards
of efficiency and integrity. For two of the standards, innovativeness
and partisan neutrality, he suggested only one indicator measure. All
indicators that represent standards, including those multiple
indicators for expertise, information processing capacity, and
representativeness, had not been examined for validity or reliability.
Obviously, much research remains to be done.

Citizen Advocacy 

Quality assurance is a process for providing citizen protection and
reduced costs. It was first mandated by federal healthcare legislation.
Woy et al. (1978, p. LSI) made a comparison of the contrasting
emphases of program evaluation and quality assurance. Within the
health context for which the authors write, quality assurance (in
contrast to program evaluation) is generally patient/client specific,
relies more on peer review (rather than administrative review), employs
consensual methods (rather than empirical and normative approaches)
to derive evaluative criteria, and has minimal aggregation of data
commonly gathered through manual methods (rather than computer
information systems).

In addition, the authors noted the gradual trend toward convergence
of quality assurance and program evaluation. Key ways this would
happen include sharing data, participation of evaluators in the peer
review process, and the input of a variety of professional staff to the
program evaluation process (pp. 440-441).

Consumer Advocacy

Cohen asserted that productivity in the human services has been
achieved where service results in empowerment of the individual. This
occurs when that consumer is able to (1) establish and achieve
appropriate purposes, (2) clarify personal values and deal with value
issues, (3) effectively understand him or herself as well as others, (4)
negotiate and work through the systems which affect his or her life,
and (5) develop and use needed skills (1978, p. 38). A citizen
empowerment chart has been prepared by Cohen to facilitate the measuring
of progress toward self-sufficiency on the five dimensions of
empowerment (pp. 10 II).

In an analysis of state government action, Jung et al. (1977) showed
how state licensing laws and regulations contrasted with the Education
Commission of the States model legislation for approval of
postsecondary educational institutions on a variety of facets: purpose,
governance, and operation; course length, content, and objectives;
degree requirements; staff qualifications; physical facilities; financial
stability; public disclosure of materials; minimal qualifications for
entering students; recruiting practices; recordkeeping practices; refund
policies; and placement. In most cases, a majority of states did not
have regulation as of January 1977.

In summary, it appears that there are many possibilities for student
legal suits and/or federal and state intervention; however, concerted
efforts to do program evaluation have been widely evident only during
the past decade. Now there is that critical mass of people and
ideas from many disciplines that is likely to forge worthy evaluation
processes.

Summary and Implications



To facilitate conceptual clarity, this report began by distinguishing
between research and evaluation. Broad generalizability of data is
characteristic of research, while immediate application to specific
decisions is the overall purpose of evaluation.

Diverse ways of using evaluation can be considered by reflecting on
the similes of evaluation discussed in this report. Both evaluators and
those who request evaluation must be alert to the multiple purposes
that can be attributed to an evaluation process. These individuals
also need an understanding of the general approaches to conducting
an evaluation (experimental, ecological, and eclectic) to achieve
maximum comprehension and adaptability.

Since program evaluations have their greatest impact within the
context of the budget process, an examination was made of the
different roles various budget approaches play in program evaluation.
Although considerably more effort and imagination are required to
complete performance or zero-base budgets than incremental or formula
budgets, the investment of time pays off in more suitable documents
and processes for a quality program evaluation.

Another set of perspectives on the conduct of program evaluation
corresponds to institutional level: department, campus, multicampus
office, coordinating agency for higher education, and state legislative
audit unit. Despite varying elements, common decisions about exact
approaches can be found on each level; for instance, whether all
programs will be evaluated rather quickly or a sample more intensively.
Also, the dominance of program goals over the evaluation process must
be determined at each level. Varying amounts of encouragement can
be given to the discovery of unanticipated outcomes in addition to the
attainment of stated goals.

Procedures for conducting a program evaluation are selected to a
large degree by determining the purpose of the evaluation. Form does
follow function, whether it be needs assessment, program adjustment,
or program verdict.

Superior evaluations result from attention to proper timing, the 
use of multiple iiummmo and diverse instruments, and consideration 
for potential ethical dilemmas.

The future of program evaluation is said to be found in
multidisciplinary efforts that serve consumers and citizens, as well as
program personnel and legislators.